Estimating Policy Functions in Payments Systems Using Reinforcement Learning
Abstract
We show that reinforcement learning techniques can be used to estimate the best-response functions of banks participating in high-value payments systems, a real-world strategic game characterized by incomplete information.
Related resources
Reinforcement Using Supervised Learning for Policy Generalization
Applying reinforcement learning to large Markov Decision Processes (MDPs) is an important issue for solving very large problems. Since exact resolution is often intractable, many approaches have been proposed to approximate the value function (for example, TD-Gammon (Tesauro 1995)) or to approximate the policy directly by gradient methods (Russell & Norvig 2002). Such approaches provide a poli...
Payments systems and monetary policy
A dynamic spatial model is constructed where there is a role for money and for centralized payments arrangements, and where there are aggregate fluctuations driven by fluctuations in aggregate productivity. With decentralized monetary exchange and no centralized payments arrangements, there is price level indeterminacy, and the equilibrium allocation is inefficient. A private clearinghouse arra...
Transfer of task representation in reinforcement learning using policy-based proto-value functions
Reinforcement Learning research is traditionally devoted to solving single-task problems. This means that, any time a new task is faced, learning must be restarted from scratch. Recently, several studies have addressed the issue of reusing the knowledge acquired in solving previous related tasks by transferring information about policies and value functions. In this paper we analyze the use of pr...
Reinforcement Learning Based PID Control of Wind Energy Conversion Systems
In this paper an adaptive PID controller for Wind Energy Conversion Systems (WECS) has been developed. The adaptation technique applied to this controller is based on Reinforcement Learning (RL) theory. Nonlinear characteristics of wind variations as plant input, wind turbine structure and generator operational behavior demand a high-quality adaptive controller to ensure both robust stability an...
On-policy concurrent reinforcement learning
When an agent learns in a multiagent environment, the payoff it receives depends on the behavior of the other agents. If the other agents are also learning, its reward distribution becomes non-stationary. This makes learning in multiagent systems more difficult than single-agent learning. Prior attempts at value-function based learning in such domains have used off-policy Q-learning that do ...
Journal
Journal title: Social Science Research Network
Year: 2022
ISSN: 1556-5068
DOI: https://doi.org/10.2139/ssrn.4226484